Integration and Development of Enterprise Internal Audit and Big Data Based on Data Mining Technology



Introduction
The computerization of accounting and of business data is now widespread in many organizations. Sampling audits of the audited unit's financial and business data are the basis of daily audit work. As data accumulate, it becomes important to find potentially useful information in large amounts of disorderly data [1,2]. Therefore, extracting what is truly valuable from these huge data sets, and providing auditors with clues and basic methods for discovering problems, is an urgent issue for the audit departments of many companies.
Compared with previous audits, big data audits have their own characteristics, which can greatly improve the management of enterprises. Therefore, big data auditing is a new trend in the development of internal auditing in Chinese enterprises. In addition, the status quo, improvement plans, and business improvements of some companies engaged in big data auditing were investigated. The investigation found that the implementation of the big data audit system follows regular patterns, and it is recommended that subsequent implementation be tracked on this basis.
This article proposes a new method of audit data analysis based on big data mining in order to innovate the audit method. The application of this new method has important reference significance for the internal audit of other enterprises.
Data mining refers to the process of searching, by means of algorithms, for information hidden in a large amount of data. Data mining is generally related to computer science and achieves its goals through a number of methods such as statistics, online analytical processing, intelligence retrieval, machine learning, expert systems (relying on past rules of thumb), and pattern recognition. This article integrates and develops enterprise internal audit and big data by using data mining technology, which is an efficient, convenient, and feasible audit method. The rest of the article is organized as follows: Section 2 details the related work, while Sections 3 and 4 present the theoretical method and the experimental simulation and analysis, respectively. Section 5 discusses internal audit analysis, and Section 6 is the final conclusion of the article. The innovation of this article is to put forward a big data audit system, based on an understanding of the company's internal audit and a comprehensive analysis of its improvement process, and to improve the clustering algorithm, a key data mining algorithm, so that it is better suited to the research topic of this article.

Related Work
With the rapid growth of data, more and more data are stored, and people need a more convenient way to obtain them. Buczak proposed a method for data analysis and mining and gave a detailed description of the specific process [3]. Xu pointed out that preserving data security is a very important issue that needs continuous study [4]. Kavakiotis proposed applying data mining and machine learning methods to the treatment of diabetes [5]. Chaurasia studied the performance of different classification techniques, using classification accuracy to test breast cancer data with a total of 683 rows and 10 columns. The purpose was to use data mining technology to develop an accurate breast cancer prediction model [6]. Yan used data mining technology to evaluate data and found it not fully applicable, noting that a large amount of data still has to be predicted [7]. Emoto observed and analyzed the characteristics of the gut microbiota of patients with coronary artery disease using data mining technology [8]. Hong proposed using data mining technology to effectively prevent flooding in Poyang Lake [9]. Huang aimed to provide an effective method for calculating a rough approximation of fuzzy concepts in a dynamic fuzzy decision system (FDS), where objects and features change at the same time [10]. Based on big data technology, Zhao extended and changed the traditional model according to the characteristics of data mining services and proposed a big data alliance data mining service process model. In addition, Zhao used intelligent decision-making theory and knowledge reasoning methods to build a fast-response, reusable, and intelligent service model that realizes the scalability of data mining services [11]. Liu analyzed and explored emerging ideas and methods of data mining techniques and conducted audits to evaluate the evolution of these techniques.
It can be seen that the combination of big data technology and industrial green manufacturing technology is progressing slowly, and it is necessary to combine industrial green manufacturing enterprises with big data technology and artificial intelligence. To address current severe environmental problems, Liu also discussed the development trend of green technology production based on big data technology and the integration and innovation of big data technology and green technology, which enrich the forms of environmental supervision and participation, support other technological progress, and improve business efficiency. Building on the progress of green technology based on big data, environmental audit strategies and suggestions are put forward [12]. The abovementioned documents describe some key technical points in detail and demonstrate the related design process of data mining well. However, none of them examines the mining ability of data mining or designs experiments on the stability of data mining, so some deficiencies remain.

Big Data Audit System
Enhancing the implementation of quality control is an important measure for controlling audit risk. Stricter implementation of quality control makes it possible to supervise effectively whether audit work and audit review are carried out in full.
3.1.1. Strengthening the Training of Practitioners. On the one hand, training is aimed at high-level managers: it strengthens their attention to quality control and their understanding of the quality control process, and fundamentally improves the environment in which quality control is executed. On the other hand, auditors are trained to strengthen their quality control awareness so that they consciously abide by the quality control system. Only when the quality control awareness of both managers and auditors is improved can the quality control system be enforced strongly, the audit work be carried out in accordance with it throughout the audit process, audit quality be improved, and audit risk be reduced.

Strengthening the Operation Supervision of Quality Control.
An independent department should be set up to supervise and inspect all accounting and auditing departments, and an effective supervision and inspection mechanism should be established. In the business undertaking stage, for example, this department checks whether the auditors have obtained a detailed understanding of the customer and the former CPA, whether the forms completed at this stage are truthful, and so on. Corresponding punishment measures, such as public criticism, fines, downgrading, and dismissal, should also be formulated.
When certified public accountants or auditors are found to have failed to implement the quality control system, they are punished to varying degrees according to the severity of the circumstances. Besides its disciplinary effect, this warns other auditors to implement the quality control system in accordance with the regulations [13]. Through such an independent department, problems in the work process can be discovered in time and risks can be prevented and resolved in time, so that staff consciously abide by the quality control system, which is conducive to the implementation of quality control.

Building a Big Data Audit System.
When enterprises face audit projects involving massive data, traditional equipment and data collection methods suffer from problems such as high resource consumption and slow data processing and analysis [14,15]. For this reason, this article proposes a big data audit system consisting of an infrastructure layer, a data layer, a data analysis layer, and an application layer, as shown in Figure 1.

Improving Audit Procedures.
Improving audit procedures is the preliminary application of big data technology in auditing practice. The audit process is long, the scope of the audit is wide, and many big data technologies are available. Therefore, applying big data technology at any step counts as an improvement to the audit procedure, as long as it is more efficient than the previous audit method [16]. This article synthesizes existing research results and summarizes, in Figure 2, the audit procedures in which big data technology is most widely applied.
It can be seen from Figure 2 that the improvement in audit procedures mainly concerns data acquisition, data processing, and data analysis. RH accounting firm can adopt some or all of these applications according to the situation of the audited unit and its own audit needs [17]. The following briefly illustrates data acquisition and data processing.
Data acquisition is shown in Figure 3. The acquired data include both internal data provided by the enterprise and external data. They comprise not only structured data from relational databases, but also semi-structured data such as web pages and XML, and unstructured data such as office files, company reports, emails, pictures, audio, and video. All of this relies on nonrelational (NoSQL) database technology. Audit evidence such as documents, pictures, audio, and video cannot be stored conveniently in a relational (SQL) database, which greatly inconveniences auditors, and NoSQL technology emerged to address this [18]. Google, the world's largest information retrieval company, has made wide use of NoSQL database systems.
In the context of big data, data processing can enhance the timeliness of audit work, and making good use of it allows audit evidence to be obtained efficiently and quickly.
There are two big data processing modes, namely stream processing and batch processing. As cloud accounting develops, real-time auditing will also be realized, and real-time auditing is the direction in which auditing must develop [19].
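The difference between the two processing modes can be illustrated with a minimal Python sketch. The audit rule (flag amounts above a fixed threshold), the threshold value, and the function names are assumptions made for illustration and do not come from the article.

```python
from typing import Iterable, Iterator, List

# Hypothetical audit rule: flag any transaction above this amount.
THRESHOLD = 10_000.0


def batch_flag(transactions: List[float]) -> List[float]:
    """Batch mode: the full data set is available before analysis starts."""
    return [amount for amount in transactions if amount > THRESHOLD]


def stream_flag(transactions: Iterable[float]) -> Iterator[float]:
    """Stream mode: each record is checked as it arrives, one at a time."""
    for amount in transactions:
        if amount > THRESHOLD:
            yield amount


ledger = [1200.0, 15000.0, 980.5, 22000.0]
print(batch_flag(ledger))               # processes the stored ledger in one pass
print(list(stream_flag(iter(ledger))))  # consumes records as they are produced
```

Both functions apply the same rule. The stream version never needs the whole ledger in memory, which is the property that real-time auditing would depend on.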

Mathematical Model of Cluster Analysis.
Cluster analysis examines how the feature vectors of the samples are distributed over the whole set $X$ and divides $x_1, x_2, \ldots, x_n$ into several disjoint groups according to the degree of closeness between the samples.
Let $X = \{x_1, x_2, \ldots, x_n\}$ be the universe of data objects to be analyzed. Each data object $x_k$ ($k = 1, 2, \ldots, n$) is described by several parameter values, each of which describes one attribute of $x_k$. Cluster analysis divides $X$ into $c$ subsets $A_1, A_2, \ldots, A_c$, and the following conditions must be met:

$$A_1 \cup A_2 \cup \cdots \cup A_c = X, \qquad A_i \cap A_j = \emptyset \ (1 \le i \ne j \le c), \qquad A_i \ne \emptyset \ (1 \le i \le c).$$

This can be expressed by the membership function as:

$$\mu_{A_i}(x_k) = \begin{cases} 1, & x_k \in A_i, \\ 0, & x_k \notin A_i. \end{cases}$$

Among them, the membership function must also meet the conditions:

$$\mu_{A_i}(x_k) \in \{0, 1\}, \qquad \sum_{i=1}^{c} \mu_{A_i}(x_k) = 1 \ \text{for all } k, \qquad \sum_{k=1}^{n} \mu_{A_i}(x_k) > 0 \ \text{for all } i.$$

That is, every sample belongs to exactly one cluster, and every subset is nonempty. Such cluster analysis is usually called hard partition.
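The hard-partition requirement stated above (every sample belongs to exactly one cluster, and no cluster is empty) can be checked mechanically. The sketch below assumes the memberships are stored as a 0/1 matrix with one row per cluster and one column per sample; this layout and the function name are illustrative choices, not part of the article.

```python
from typing import List


def is_hard_partition(u: List[List[int]]) -> bool:
    """Check the hard-partition conditions on a membership matrix.

    u[i][k] is 1 if sample k belongs to cluster i, and 0 otherwise.
    """
    if not u or not u[0]:
        return False
    n = len(u[0])
    # Every row must have n entries, and every entry must be 0 or 1.
    if any(len(row) != n or any(v not in (0, 1) for v in row) for row in u):
        return False
    # Each sample belongs to exactly one cluster (every column sums to 1).
    if any(sum(row[k] for row in u) != 1 for k in range(n)):
        return False
    # No cluster is empty (every row sum is positive).
    return all(sum(row) > 0 for row in u)


print(is_hard_partition([[1, 1, 0], [0, 0, 1]]))  # two clusters over three samples
```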

Binary Variables.
Binary variables describe data objects using attributes that have only two states, 0 and 1. For example, for a variable that describes the state of a thing, 1 means existence and 0 means nonexistence. Exactly one of these two states applies, and no third state exists. Binary variables can be subdivided into two types, symmetric and asymmetric. For symmetric binary variables the two states are equally important, while asymmetric binary variables assign different weights to the two states [20]. The simple matching coefficient can be expressed as: Similarly, for asymmetric binary variables, the similarity of different variables is measured by the Jaccard coefficient. Supposing the values of p, q, r, and s are the same as above, the Jaccard coefficient is as follows: where $\mu$ is the total number of attributes and $m$ is the number of matching attributes in the data objects $\chi_i$ and $\chi_j$.
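As a hedged illustration of the two coefficients named above, the following sketch uses the common textbook definitions. The count names a, b, c, and d are assumptions introduced here (a counts attributes that are 1 in both objects, d those that are 0 in both, b and c the two kinds of mismatch) and are not the article's own symbols. Under this convention the simple matching coefficient equals the number of matching attributes divided by the total number of attributes, while the Jaccard coefficient ignores the 0/0 matches.

```python
from typing import Sequence, Tuple


def counts(x: Sequence[int], y: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count attribute pairs: a = (1,1), b = (1,0), c = (0,1), d = (0,0)."""
    if len(x) != len(y) or not x:
        raise ValueError("objects must be non-empty and have the same attributes")
    a = sum(1 for u, v in zip(x, y) if (u, v) == (1, 1))
    b = sum(1 for u, v in zip(x, y) if (u, v) == (1, 0))
    c = sum(1 for u, v in zip(x, y) if (u, v) == (0, 1))
    d = sum(1 for u, v in zip(x, y) if (u, v) == (0, 0))
    return a, b, c, d


def simple_matching(x: Sequence[int], y: Sequence[int]) -> float:
    """Symmetric case: both kinds of match (1/1 and 0/0) count equally."""
    a, b, c, d = counts(x, y)
    return (a + d) / (a + b + c + d)


def jaccard(x: Sequence[int], y: Sequence[int]) -> float:
    """Asymmetric case: joint absences (0/0) are ignored."""
    a, b, c, _ = counts(x, y)
    if a + b + c == 0:
        raise ValueError("Jaccard coefficient is undefined for two all-zero objects")
    return a / (a + b + c)


x = [1, 0, 1, 1, 0]
y = [1, 1, 0, 1, 0]
print(simple_matching(x, y), jaccard(x, y))
```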

Interval Scale Variables.
Interval scale variables are continuous measures on a linear scale, such as width and length, height and weight, or air pressure and temperature. Before data objects are divided into categories, a measure of difference or similarity must be defined to quantify the difference between objects in different categories and the similarity of objects in the same category. The usual method is to measure the distance between data objects. Two data objects with n-dimensional attributes can be expressed as:

$$x_i = (x_{i1}, x_{i2}, \ldots, x_{in}), \qquad x_j = (x_{j1}, x_{j2}, \ldots, x_{jn}).$$

Computational Intelligence and Neuroscience
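For the distance $d(x_i, x_j)$ between two data objects $x_i = (x_{i1}, \ldots, x_{in})$ and $x_j = (x_{j1}, \ldots, x_{jn})$, the main distance functions are:

(1) Minkowski distance:

$$d_q(x_i, x_j) = \left( \sum_{k=1}^{n} |x_{ik} - x_{jk}|^{q} \right)^{1/q}.$$

When q takes the values 1, 2, and ∞, the Minkowski distance becomes:

(i) Absolute distance ($q = 1$):

$$d_1(x_i, x_j) = \sum_{k=1}^{n} |x_{ik} - x_{jk}|.$$

(ii) Euclidean distance ($q = 2$):

$$d_2(x_i, x_j) = \sqrt{\sum_{k=1}^{n} (x_{ik} - x_{jk})^{2}}.$$

(iii) Chebyshev distance ($q \to \infty$):

$$d_\infty(x_i, x_j) = \max_{1 \le k \le n} |x_{ik} - x_{jk}|.$$

(2) Mahalanobis distance:

The Minkowski distance described above is only applicable to the usual Euclidean space. The attribute values of a data object are usually random variables, and because they are randomly distributed, the components may be correlated. Therefore, the Mahalanobis distance between the ith sample and the jth sample can be expressed as:

$$d_M(x_i, x_j) = \sqrt{(x_i - x_j)^{\mathsf{T}} \, \Sigma^{-1} \, (x_i - x_j)},$$

where Σ is the covariance matrix.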
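A minimal sketch of these distance functions in plain Python follows. The function names are illustrative, and the Mahalanobis version is restricted to two dimensions so that the inverse of the covariance matrix can be written out explicitly without a linear algebra library; a general implementation would use a matrix inverse instead.

```python
import math
from typing import Sequence


def minkowski(x: Sequence[float], y: Sequence[float], q: float) -> float:
    """Minkowski distance of order q; q = math.inf gives the Chebyshev distance."""
    if len(x) != len(y) or not x:
        raise ValueError("objects must be non-empty and of equal dimension")
    if q < 1:
        raise ValueError("q must be at least 1")
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(q):
        return max(diffs)
    return sum(d ** q for d in diffs) ** (1.0 / q)


def mahalanobis_2d(
    x: Sequence[float], y: Sequence[float], cov: Sequence[Sequence[float]]
) -> float:
    """Mahalanobis distance for 2-D objects, given a 2 x 2 covariance matrix."""
    (s11, s12), (s21, s22) = cov
    det = s11 * s22 - s12 * s21
    if det == 0:
        raise ValueError("covariance matrix must be invertible")
    dx, dy = x[0] - y[0], x[1] - y[1]
    # (dx, dy) * inverse(cov) * (dx, dy)^T, with the 2 x 2 inverse written out.
    squared = (s22 * dx * dx - (s12 + s21) * dx * dy + s11 * dy * dy) / det
    return math.sqrt(squared)


p, r = (0.0, 0.0), (3.0, 4.0)
print(minkowski(p, r, 1), minkowski(p, r, 2), minkowski(p, r, math.inf))
print(mahalanobis_2d(p, r, ((1.0, 0.0), (0.0, 1.0))))
```

With the identity matrix as covariance, the Mahalanobis distance reduces to the Euclidean distance, which is a convenient sanity check.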
In clustering algorithms, distance is usually used as a very intuitive measure of difference, especially the Euclidean distance, which is used in this article [21]. Here we briefly introduce two important similarity measures, both of which are similarity coefficients. A similarity coefficient lies between −1 and 1. The closer the coefficient is to ±1, the more similar the samples are.
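(1) Cosine of Included Angle. In cluster analysis, let sample i in the p-dimensional space be

$$x_i = (x_{i1}, x_{i2}, \ldots, x_{ip}),$$

and let sample j be

$$x_j = (x_{j1}, x_{j2}, \ldots, x_{jp}).$$

The cosine of the angle between the two samples expresses their similarity coefficient and is recorded as:

$$\cos\theta_{ij} = \frac{\sum_{k=1}^{p} x_{ik} x_{jk}}{\sqrt{\sum_{k=1}^{p} x_{ik}^{2} \, \sum_{k=1}^{p} x_{jk}^{2}}}.$$

(2) Correlation Coefficient. The correlation coefficient of samples i and j can be recorded as:

$$r_{ij} = \frac{\sum_{k=1}^{p} (x_{ik} - \bar{x}_i)(x_{jk} - \bar{x}_j)}{\sqrt{\sum_{k=1}^{p} (x_{ik} - \bar{x}_i)^{2} \, \sum_{k=1}^{p} (x_{jk} - \bar{x}_j)^{2}}},$$

where $\bar{x}_i$ and $\bar{x}_j$ are the mean values:

$$\bar{x}_i = \frac{1}{p} \sum_{k=1}^{p} x_{ik}, \qquad \bar{x}_j = \frac{1}{p} \sum_{k=1}^{p} x_{jk}.$$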
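The two similarity coefficients can be sketched in a few lines of Python. The function names are illustrative. The correlation coefficient is computed here as the cosine of the mean-centered samples, which is the standard relationship between the two measures.

```python
import math
from typing import Sequence


def cosine_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Cosine of the angle between two samples of equal dimension."""
    if len(x) != len(y) or not x:
        raise ValueError("samples must be non-empty and of equal dimension")
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    if norm == 0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / norm


def correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Correlation coefficient: the cosine of the mean-centered samples."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine_similarity([a - mean_x for a in x], [b - mean_y for b in y])


print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
print(correlation([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]))
```

Proportional samples give a coefficient close to 1 and reversed samples give a correlation close to −1, matching the stated range of the coefficient.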

Level-Based Approach.
The hierarchical clustering method decomposes the data set into several groups (classes) to form a clustering tree. According to the direction of decomposition, it can be divided into top-down split (divisive) hierarchical clustering and bottom-up agglomerative hierarchical clustering. Agglomerative hierarchical clustering initially treats each data object as its own class and then merges classes level by level until no further merging is possible. Split hierarchical clustering treats all data objects as one class and then splits it step by step according to given rules, producing subclasses, until the termination condition of the clustering is reached. Figure 4 illustrates agglomerative and split clustering on the simple data set (a, b, c, d, e).
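The bottom-up procedure can be sketched on a five-object data set like the one named above. The coordinates assigned to a, b, c, d, and e are hypothetical (the article's Figure 4 values are not available here), and single linkage, which merges the two clusters whose closest members are nearest, is only one possible merging rule.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

Merge = Tuple[FrozenSet[str], FrozenSet[str]]


def agglomerate(points: Dict[str, float]) -> List[Merge]:
    """Return the sequence of merges produced by single-linkage clustering."""
    # Start with every object in its own cluster.
    clusters: List[FrozenSet[str]] = [frozenset([name]) for name in sorted(points)]
    merges: List[Merge] = []
    while len(clusters) > 1:
        best: Optional[Tuple[float, int, int]] = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                dist = min(
                    abs(points[p] - points[q])
                    for p in clusters[i]
                    for q in clusters[j]
                )
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        assert best is not None
        _, i, j = best
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Hypothetical one-dimensional positions for the objects a to e.
points = {"a": 1.0, "b": 1.5, "c": 5.0, "d": 5.4, "e": 9.0}
for left, right in agglomerate(points):
    print(sorted(left), "+", sorted(right))
```

Reading the printed merges from first to last gives the clustering tree from the leaves upward; reading them in reverse gives the order in which a top-down split procedure would ideally separate the same groups.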

The Characteristics of Internal Audit Informatization under Big Data.
In the professional field of internal auditing, the "Global Technology Audit Guide" issued by the Institute of Internal Auditors (IIA) summarizes the connotation of "internal audit informatization": it mainly includes information technology vulnerability management, information technology audit, information technology control, etc. In order to realize the internal audit function under the guidance of the enterprise development strategy, the internal audit department supervises, evaluates, and optimizes the enterprise's risk, control, and corporate governance [22]. Modern information technology is used to build a big data audit platform based on "cloud computing" that collects, extensively and in real time, the financial and business data generated in the operation of the enterprise.
Compared with internal audit in the traditional sense, the characteristics of information-based internal audit are as follows: diversification of audit content, digitization of audit objects, intelligence of audit management, and modernization of audit technology.

Risk-Oriented Concept.
In 2013, the Institute of Internal Auditors (IIA) released the three lines of defense model for effective risk management [23]. The first line of defense is business management and internal control, which mainly addresses risks in business operations. The second line of defense includes financial control, quality control, etc. Its role is mainly to monitor the cost-effectiveness of business operations, and it also supervises the first line of defense to ensure its effectiveness. The third line of defense is internal audit, which builds on the first and second lines, focuses on the loopholes and risk points in the company's operations, conducts key audits, and issues audit results. The "three lines of defense model for effective risk management" is shown in Figure 5.
The internal audit work is risk oriented, and a risk early warning system is established on the basis of the audit data warehouse. The system runs automatically according to its program settings and issues early warning information in time, thereby reducing the risks in the business process. The functions of the risk early warning system are mainly realized through the following aspects:
(1) Establishing a risk early warning model. The risk early warning model is based on risk early warning indicators and covers most of the risk points in the company's operations that the audit focuses on. The model automatically calculates and compares the data in the audit data warehouse, and the auditor then further analyzes the abnormal and fluctuating indicators to form a risk early warning report.
(2) The push and follow-up feedback mechanism of risk early warning reports. The audit department promptly pushes the risk warning report to the responsible department and helps it rectify and eliminate risks in a timely manner. To ensure that the audit rectification opinions are implemented, the system continues to monitor the responsible department's implementation and feeds the results back.
(3) Periodic reporting system to the management. The audit department regularly prepares a special report on the results of risk early warning, the rectification carried out by the responsible departments, and the audit recommendations for improving risk management. This report is submitted to the management in order to improve the performance and the standing of the internal audit department.
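The first aspect, comparing warehouse data against risk early warning indicators, can be sketched as a simple range check. The indicator names, the allowed ranges, and the data layout below are hypothetical examples; the article does not specify its indicators or thresholds.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Indicator:
    """A risk early warning indicator with its acceptable range."""
    name: str
    lower: float
    upper: float


def early_warnings(indicators: List[Indicator], observed: Dict[str, float]) -> List[str]:
    """Return the names of indicators that are missing or outside their range."""
    flagged = []
    for indicator in indicators:
        value = observed.get(indicator.name)
        if value is None or not (indicator.lower <= value <= indicator.upper):
            flagged.append(indicator.name)
    return flagged


# Hypothetical indicators and values computed from the audit data warehouse.
indicators = [
    Indicator("receivables_turnover_days", 0.0, 90.0),
    Indicator("expense_growth_rate", -0.10, 0.20),
]
observed = {"receivables_turnover_days": 120.0, "expense_growth_rate": 0.05}
print(early_warnings(indicators, observed))
```

In this sketch the flagged names are only the starting point: as the text describes, the auditor still analyzes each abnormal indicator before a risk early warning report is issued.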

Elements of Big Data Audit.
The "Big Data Audit System" mainly covers audit data collection, audit data storage management, and audit business application modules. The audit business application modules are mainly composed of audit early warning modules, audit support modules, and information access modules. The general implementation mode of the "Big Data Audit System" is shown in Figure 6. The system operates as follows. First, the audit data warehouse regularly imports audit data from the BSS business system, the ERP business system, the network transport system, and other application systems. The source data are processed by extraction, cleaning, and elimination so that the data in the audit data warehouse meet the needs of the audit. Second, based on the audit data warehouse, the audit department has developed an audit early warning system and an online monitoring system. These applications make full use of data mining technology (e.g., statistical techniques, artificial intelligence, and neural networks) and can process the data in the audit data warehouse in depth. Finally, the analysis results are presented to the auditors through visualization technology, such as electronic forms and other information access tools, and the auditors draw audit conclusions from them.
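The first step of this operating mode, importing data from several source systems and then extracting, cleaning, and weeding out records, can be sketched as a small pipeline. The record fields (`voucher_id`, `amount`), the cleaning rule (trimming whitespace), and the elimination rules (dropping incomplete and duplicate records) are assumptions chosen for illustration only.

```python
from typing import Dict, Iterable, List, Optional, Sequence

Record = Dict[str, Optional[str]]


def extract(sources: Dict[str, Iterable[Record]]) -> List[Record]:
    """Pull records from each source system and tag them with their origin."""
    rows: List[Record] = []
    for system, records in sources.items():
        for record in records:
            rows.append({**record, "source": system})
    return rows


def clean(rows: List[Record]) -> List[Record]:
    """Trim surrounding whitespace from every string field."""
    return [
        {key: value.strip() if isinstance(value, str) else value
         for key, value in row.items()}
        for row in rows
    ]


def weed_out(rows: List[Record],
             required: Sequence[str] = ("voucher_id", "amount")) -> List[Record]:
    """Drop rows with a missing required field and duplicate vouchers per source."""
    seen = set()
    kept: List[Record] = []
    for row in rows:
        if any(not row.get(field) for field in required):
            continue
        key = (row["source"], row["voucher_id"])
        if key in seen:
            continue
        seen.add(key)
        kept.append(row)
    return kept


# Hypothetical extracts from two of the source systems named in the text.
sources = {
    "ERP": [
        {"voucher_id": " V001 ", "amount": "100.00"},
        {"voucher_id": "V001", "amount": "100.00"},   # duplicate after cleaning
        {"voucher_id": "V002", "amount": None},       # incomplete record
    ],
    "BSS": [{"voucher_id": "B-9", "amount": " 45.5"}],
}
warehouse = weed_out(clean(extract(sources)))
print(warehouse)
```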

Basic Survey of Internal Audit.
China has a large number of enterprises of many different types. Compared with other types of enterprises, however, listed companies are more profitable, larger in scale, and numerous. Therefore, the analysis of the status quo of internal audit in Chinese enterprises focuses on listed companies in China. The relevant data on the implementation of internal audits by listed companies were obtained mainly through a combination of databases and questionnaires (the data come from Juchao Consulting Network, Guotaian Database, published statistical yearbooks, and the internal audit related systems of listed companies). The specific data are shown in Table 1.

Basic Overview of Internal Audit.
There are many types of corporate internal audit services. Common audit services include financial audit, internal control audit, operation audit, special audit, and risk audit. Statistics on this aspect are shown in Table 2.
It can be seen from the above survey results that the company's internal audit department is involved in financial   Table 3.
Listed companies should pay attention to the training of internal auditors and raise the threshold for entering internal audit institutions. This allows truly capable audit talent to join and improves the comprehensive capabilities of the company's entire internal audit team.

Figure 6: General implementation mode of the "Big Data Audit System" (diagram labels: BSS business system, ERP business system, net transport system database; extract, cleaning condition, weed out; audit data collection, audit data import, audit data storage management).

Investigation on the Implementation of Internal Audit Informatization.
According to the survey feedback of 56 listed companies, 7 are not currently considering the use of data mining technology in internal audit, 16 have considered using data mining technology for internal auditing but have not implemented it, and 33 have considered it and implemented it. The specific data are shown in Table 4.
Among the 23 listed companies that have not implemented data mining technology in internal audit, 4 believe that they already have sufficient audit capabilities and do not need to rely on data mining technology, and 6 believe that using data mining technology in internal auditing carries certain risks. The relevant data are shown in Table 5.
The investigation shows that more than 50% of the listed companies (33 of 56, about 59%) have implemented internal audit informatization. However, some companies have considered data mining technology for internal audit without implementing it, or do not consider it at all. This shows to a large extent that Chinese enterprises give insufficient consideration to the factors that affect the informatization of internal auditing. In the process of implementing informatization, specific implementation paths and management rules to guide and regulate the work are lacking.

The Overall Level of Internal Audit Resource Integration.
At present, resource integration in corporate internal audit practice is often implemented only in a fragmented way and is not treated as systematic work. Its main purpose is to make up for shortages of certain types of resources in the execution of internal audit projects. This kind of resource integration can indeed achieve the necessary integration of certain audit resources that were originally independent. However, it lacks a holistic view: it is difficult to pass on good experience, to give feedback on existing problems, and to preserve complete information about the process. As a whole, it is unlikely to have long-term significance for improving the level of internal audit management. The internal audit work of the surveyed companies maintains a high degree of independence, and the acquisition of company-related information is generally smooth and comprehensive. However, many problems remain to be solved in practice. The degree of integration of the various elements of internal audit resources is measured at three levels: low (1-10), medium (11-20), and high (21-30). The internal audit resource integration of the interviewed companies that did not use big data for auditing and of those that did is shown in Figure 7.
As can be seen from the figure, although the application of data mining to big data has optimized all aspects of internal audit, the structure of the audit team has only just been adjusted, so unstable factors remain, such as the insufficient audit experience of some personnel. In addition, newly transferred internal auditors may not be able to carry out large-scale audit projects quickly and independently. This will affect the allocation of human resources across audit projects to a certain extent. Moreover, in recent years, because some audit projects have been large in scale, complex in nature, and broad in the professional fields they involve, the audit department has seconded professionals from the corporate finance department to complete the audit tasks. Although the final audit results have been affirmed by the company, the lack of a mature secondment mechanism and the temporary establishment of audit project teams may affect the original work of other departments. These contradictions have yet to be resolved.

Accuracy Analysis of Clustering Algorithm.
In order to verify the performance of the improved algorithm, this article conducts experiments on four data sets from the UCI machine learning database. The algorithm was run 10 times on each data set, the average value was recorded, and the result was compared with the traditional K-means clustering algorithm. The result is shown in Figure 8. From the four sets of experimental data, we can see that the method proposed in this article achieves a considerably higher accuracy than the traditional K-means clustering algorithm. To compare the ability of the two algorithms to obtain the best number of clustering categories, this study also ran each algorithm 50 times on each of the four data sets. The result is shown in Figure 9. The data in the figure show intuitively that the clustering algorithm proposed in this article is markedly stronger than the traditional clustering algorithm at obtaining the best number of clustering categories. This benefits from the powerful global optimization capability of the algorithm. The inertia adjustment coefficient h also affects the accuracy of the algorithm, and a proper inertia adjustment coefficient can effectively correct the inertia coefficient of the particles, so that the particles exhibit the characteristics required at each stage. To find the best inertia adjustment coefficient, this article selects a range of inertia adjustment coefficient values, clusters the four data sets with each of them, and plots the results in Figure 10.
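The evaluation protocol for the baseline (run the traditional K-means algorithm several times and average a quality score) can be sketched as follows. This is a plain-Python illustration, not the article's experiment: the toy data set is hypothetical, cluster purity is used as a stand-in for the article's accuracy measure, and the improved algorithm itself is not reproduced.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def kmeans(points: Sequence[Point], k: int, rng: random.Random,
           iters: int = 50) -> List[int]:
    """Traditional K-means with random initial centers; returns cluster labels."""
    centers = rng.sample(list(points), k)
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        new_labels = [
            min(range(k), key=lambda c: (p[0] - centers[c][0]) ** 2
                                        + (p[1] - centers[c][1]) ** 2)
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centers[c] = (sum(m[0] for m in members) / len(members),
                              sum(m[1] for m in members) / len(members))
        if new_labels == labels:
            break
        labels = new_labels
    return labels


def purity(labels: Sequence[int], truth: Sequence[int]) -> float:
    """Share of samples that carry the majority true class of their cluster."""
    total = 0
    for c in set(labels):
        counts = Counter(t for lab, t in zip(labels, truth) if lab == c)
        total += counts.most_common(1)[0][1]
    return total / len(labels)


def average_purity(points: Sequence[Point], truth: Sequence[int], k: int,
                   runs: int = 10) -> float:
    """Average the score over several runs with different random seeds."""
    scores = [purity(kmeans(points, k, random.Random(seed)), truth)
              for seed in range(runs)]
    return sum(scores) / runs


# Hypothetical toy data: two well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
truth = [0, 0, 0, 0, 1, 1, 1, 1]
print(average_purity(points, truth, k=2))
```

Averaging over seeds matters because K-means depends on its random initial centers; a single run can land in a poor local optimum, which is the weakness that a global optimization step is meant to address.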
The figure shows that the way accuracy changes with the inertia adjustment coefficient differs across data sets, and the best inertia adjustment coefficient differs as well. When the inertia adjustment coefficient is too large, the running state of the particles overcorrects their next movement. This makes the overall inertia factor larger and greatly reduces the particles' local search ability, which leads to a decrease in the accuracy of the algorithm.
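The article does not state the update rule of its particle-based algorithm. Assuming it builds on the textbook particle swarm optimization (PSO) velocity update, the role of inertia can be illustrated with the sketch below. The symbols `w`, `c1`, and `c2` are the conventional PSO names rather than symbols from the article, and the article's adjustment coefficient h is not modeled.

```python
import random
from typing import List, Optional, Sequence, Tuple


def pso_step(position: Sequence[float], velocity: Sequence[float],
             personal_best: Sequence[float], global_best: Sequence[float],
             w: float, c1: float = 2.0, c2: float = 2.0,
             rng: Optional[random.Random] = None) -> Tuple[List[float], List[float]]:
    """One textbook PSO update for a single particle.

    w is the inertia weight: the fraction of the previous velocity that is kept.
    """
    rng = rng or random.Random(0)
    new_velocity = []
    for x, v, pb, gb in zip(position, velocity, personal_best, global_best):
        r1, r2 = rng.random(), rng.random()
        # Inertia term + pull toward the particle's own best + pull toward the swarm's best.
        new_velocity.append(w * v + c1 * r1 * (pb - x) + c2 * r2 * (gb - x))
    new_position = [x + v for x, v in zip(position, new_velocity)]
    return new_position, new_velocity


# A particle sitting exactly on both best positions: only inertia moves it.
for w in (0.4, 0.9, 1.5):
    new_position, new_velocity = pso_step([1.0], [2.0], [1.0], [1.0], w=w)
    print(w, new_position, new_velocity)
```

In the printed example the attraction terms vanish, so the step size equals `w` times the old velocity. The larger the inertia, the further the particle overshoots a position it has already identified as good, which is consistent with the loss of local search ability described above.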
Based on the above analysis, we can conclude that the accuracy of the improved clustering algorithm has increased by 31.4%. Its optimal clustering ability has increased by 20.7%, and the company's internal audit resources have been improved by 17.4%. It can be seen that while the improved algorithm has greatly improved its performance, it also has a greater role in promoting the company's internal audit.

Conclusions
This article mainly studies the improvement of the company's internal audit. It uses data mining technology to conduct a fusion analysis of big data so that the company's internal audit work can be improved. First, it analyzes the clustering algorithm, a key algorithm of data mining technology, and optimizes and improves it. This makes the improved algorithm better suited to handling the company's internal audit issues. In the experiment and analysis part, a comparative analysis of audit resource integration and algorithm performance is carried out, and it is concluded that the improvement to the algorithm has a very good effect.

Data Availability
No data were used to support this study.